# Chat LLM Proxy API

## Overview

The Chat LLM Proxy project provides a set of APIs for interacting with various language models. This documentation outlines the available endpoints, request and response formats, and example usage to help developers integrate with the API effectively.

## Table of Contents

- [API Endpoints](#api-endpoints)
  - [Generate Answer](#generate-answer)
  - [Get Sources](#get-sources)
  - [Handle Continuation Request](#handle-continuation-request)
  - [Run Concurrently](#run-concurrently)
  - [Stream Generate Answer](#stream-generate-answer)
- [Request Models Documentation](#request-models-documentation)

## API Endpoints
### Generate Answer

`POST /api/generate_answer`

#### Overview

The `generate_answer` API endpoint is responsible for generating a response based on the provided input parameters, utilizing various language models. It processes the request and returns a structured response containing the generated answer along with relevant metadata.
Request Parameters
Parameter | Type | Description |
---|---|---|
request | GenerateAnswerRequest | The request object containing user input and configuration for generating the answer. |
is_session | bool | Indicates whether the request is part of an ongoing session. |
model_info | dict | A dictionary containing information about the model to be used for generating the answer. |
prompt | str | The prompt or question for which the answer is to be generated. |
tool_defns | list | A list of tool definitions that may be used in the answer generation process. |
all_tools | dict | A dictionary containing all available tools for the request. |
history_prompt | list | A list of previous prompts or messages in the conversation history. |
query | str | The query string that may influence the answer generation. |
gen_search_text | str | The generated search text that may be included in the response. |
generated_chat_id | int | A unique identifier for the generated chat session. |
api_key | str | The API key for authentication and authorization purposes. |
#### Response Format

The response from the `generate_answer` API is a JSON object containing the following fields:

| Field | Type | Description |
|---|---|---|
| `response` | `str` | The generated answer based on the input prompt. |
| `generated_search_text` | `str` | The search text generated during the processing of the request. |
| `finish_reason` | `str` | Indicates the reason for the completion of the response generation (e.g., "stop", "length"). |
#### Example Usage

##### Request

```json
{
  "request": {
    "user_cred": {
      "token": "user_token",
      "client": {
        "tenant_id": "tenant_id"
      }
    },
    "task_process": {
      "service": "service_name"
    },
    "user_chat": {
      "query": "What is the capital of France?",
      "kvp": {}
    }
  },
  "is_session": true,
  "model_info": {
    "modelId": "azure",
    "modelVersion": "v1",
    "temperature": 0.7,
    "max_tokens": 150
  },
  "prompt": "What is the capital of France?",
  "tool_defns": [],
  "all_tools": {},
  "history_prompt": [],
  "query": "What is the capital of France?",
  "gen_search_text": "",
  "generated_chat_id": 12345,
  "api_key": "your_api_key"
}
```
##### Response

```json
{
  "response": "The capital of France is Paris.",
  "generated_search_text": "",
  "finish_reason": "stop"
}
```
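For reference, a minimal Python client call is sketched below. The base URL `http://localhost:8000` is a placeholder, and the payload is abbreviated; in practice, send the full request body shown above.

```python
import requests

BASE_URL = "http://localhost:8000"  # placeholder; use your deployment's host

# Abbreviated payload for illustration; see the full request body above.
payload = {
    "prompt": "What is the capital of France?",
    "query": "What is the capital of France?",
    "is_session": True,
    "model_info": {"modelId": "azure", "modelVersion": "v1"},
    "api_key": "your_api_key",
}

resp = requests.post(f"{BASE_URL}/api/generate_answer", json=payload, timeout=60)
resp.raise_for_status()
body = resp.json()
print(body["response"], "| finish_reason:", body["finish_reason"])
```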
### Get Sources

`POST /api/get_sources`

#### Overview

The `get_sources` function retrieves various sources of information based on the user's request and bot details. It processes the input request and returns a structured dictionary containing relevant data.
#### Request Parameters

The function accepts the following parameters:

- `request` (`GenerateAnswerRequest`): An object containing user request details, including user credentials and task process information.
- `bot_details` (`dict`): A dictionary containing details about the bot, including `llm_data`, `system_instruction_cache_key`, and `num_new_uploads`.
- `chat_hist` (`list`): A list of previous chat messages that may influence the current request.
#### Response Format

The function returns a tuple containing:

- `sources` (`dict`): A dictionary with the following keys:
  - `gen_search_text`: The generated search text based on the context.
  - `model_info`: Information about the model being used.
  - `examples`: Example responses from the bot.
  - `query`: The processed query string.
  - `all_input_texts`: A dictionary containing filtered input texts.
  - `llm_data`: The data related to the language model.
  - `history_prompt`: The prompt history for the conversation.
  - `num_new_uploads`: The number of new uploads associated with the request.
- `bool`: A boolean indicating the success or failure of the operation.
#### Example Usage

```python
from app.controller.chat_classes import GenerateAnswerRequest

# Create a request object (user_credentials, task_process_info, bot_configuration,
# user_chat_info, and previous_context are placeholders for your own values)
request = GenerateAnswerRequest(
    user_cred=user_credentials,
    task_process=task_process_info,
    bot_config=bot_configuration,
    user_chat=user_chat_info,
    prev_context=previous_context,
)

# Bot details
bot_details = {
    "llm_data": llm_data,
    "system_instruction_cache_key": "some_cache_key",
    "num_new_uploads": 2,
}

# Chat history
chat_hist = [
    {"user_query": "What is the weather today?"},
    {"user_query": "Tell me about the news."},
]

# Call the get_sources function
sources, success = get_sources(request, bot_details, chat_hist)

# Output the sources
print(sources)
```
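Since the second element of the returned tuple signals success or failure, callers will typically want to guard on it before reading the sources. A minimal sketch:

```python
if not success:
    raise RuntimeError("get_sources failed for this request")

# The documented keys are safe to read once success is confirmed.
print(sources["query"])
print(sources["model_info"])
```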
### Handle Continuation Request

#### Description

The `handle_continuation_request` function processes incoming queries to determine if a continuation request is being made. Specifically, it checks if the query matches a predefined constant that indicates a request to continue from the last answer, as shown in the sketch below.

#### Parameters

- `query` (`str`): The incoming message payload to check. This is the user input that may indicate a continuation request.

#### Returns

- `str`: The processed query. If the input query matches the constant indicating continuation, it returns the string `"continue"`. Otherwise, it returns the original query.
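The check itself is a simple string comparison. A minimal sketch follows; the constant name `CONTINUE_ANSWER` is hypothetical, and the actual constant is defined elsewhere in the application.

```python
# Hypothetical constant; the real name and value live in the app's constants module.
CONTINUE_ANSWER = "$continue_answer"

def handle_continuation_request(query: str) -> str:
    """Return "continue" when the query is the continuation sentinel."""
    if query == CONTINUE_ANSWER:
        return "continue"
    return query
```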
#### Example Usage

```python
# Example of handling a continuation request
user_query = "$continue_answer"
processed_query = handle_continuation_request(user_query)
print(processed_query)  # Output: "continue"

# Example with a regular query
user_query = "What is the weather today?"
processed_query = handle_continuation_request(user_query)
print(processed_query)  # Output: "What is the weather today?"
```
### Run Concurrently

#### Purpose

The `run_concurrently` function executes two tasks concurrently using a thread pool, allowing work that can be performed simultaneously to proceed in parallel and improving overall application performance. A sketch of this pattern appears after the return values below.

#### Request Parameters

The function accepts the following parameters:

- `llm_data` (`dict`): A dictionary containing data related to the language model, including any necessary configurations and inputs.
- `token` (`str`): The authentication token used to access secured resources.
- `query` (`str`): The query string that will be processed by the language model.
- `service` (`str`): The service identifier that specifies which language model service to use.

#### Return Values

The function returns a tuple containing:

- `search_text_result` (`str`): The result of the semantic search operation.
- `extracted_text_result` (`str`): The result of the text extraction operation.
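A minimal sketch of the concurrency pattern, using `concurrent.futures.ThreadPoolExecutor`; the worker functions `run_semantic_search` and `run_text_extraction` are hypothetical stand-ins for the module's actual callables.

```python
from concurrent.futures import ThreadPoolExecutor

def run_concurrently(llm_data: dict, token: str, query: str, service: str):
    # run_semantic_search and run_text_extraction are hypothetical worker callables.
    # Submitting both lets them execute in parallel on the pool's threads.
    with ThreadPoolExecutor(max_workers=2) as pool:
        search_future = pool.submit(run_semantic_search, llm_data, token, query, service)
        extract_future = pool.submit(run_text_extraction, llm_data, token)
        # result() blocks until each task completes, so the function returns
        # only once both results are available.
        return search_future.result(), extract_future.result()
```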
#### Example Usage

```python
llm_data = {
    "context": "Sample context for processing.",
    "upload_files": [],
    # Additional necessary data...
}
token = "your_auth_token"
query = "What is the capital of France?"
service = "example_service"

search_text, extracted_text = run_concurrently(llm_data, token, query, service)
print("Search Text:", search_text)
print("Extracted Text:", extracted_text)
```
### Stream Generate Answer

`POST /api/stream_generate_answer`

#### Overview

The `stream_generate_answer` API endpoint is designed to generate answers in a streaming manner based on the provided request parameters. This allows for real-time interaction and response generation, making it suitable for applications that require immediate feedback.
#### Request Parameters

| Parameter | Type | Description |
|---|---|---|
| `request` | `GenerateAnswerRequest` | The request object containing user credentials, query, and other necessary information. |
| `is_session` | `bool` | Indicates whether the request is part of an ongoing session. |
| `model_info` | `dict` | A dictionary containing information about the model being used, such as model ID and version. |
| `prompt` | `str` | The prompt to be used for generating the answer. |
| `tool_defns` | `list` | A list of tool definitions that may be used in the answer generation process. |
| `all_tools` | `dict` | A dictionary containing all available tools for the request. |
| `history_prompt` | `list` | A list of previous prompts to provide context for the current request. |
| `query` | `str` | The query string that the model will respond to. |
| `gen_search_text` | `str` | The generated search text that may be used in the response. |
| `generated_chat_id` | `int` | A unique identifier for the generated chat session. |
| `api_key` | `str` | The API key for authentication purposes. |
#### Response Format

The response from the `stream_generate_answer` API is a stream of chunks, each containing the following structure:

| Field | Type | Description |
|---|---|---|
| `generated_search_text` | `str` | The search text generated during the answer generation process. |
| `response` | `str` | The generated answer from the model. |
| `finish_reason` | `str` | Indicates the reason for finishing the response generation (e.g., "stop", "length"). |
#### Example Usage

##### Request Example

```json
{
  "request": {
    "user_cred": {
      "token": "your_token_here",
      "client": {
        "tenant_id": "your_tenant_id"
      }
    },
    "user_chat": {
      "query": "What is the capital of France?"
    }
  },
  "is_session": true,
  "model_info": {
    "modelId": "azure",
    "modelVersion": "v1"
  },
  "prompt": "Please provide the capital city of France.",
  "tool_defns": [],
  "all_tools": {},
  "history_prompt": [],
  "query": "What is the capital of France?",
  "gen_search_text": "",
  "generated_chat_id": 12345,
  "api_key": "your_api_key_here"
}
```
##### Response Example

```json
{
  "generated_search_text": "The capital of France is Paris.",
  "response": "The capital of France is Paris.",
  "finish_reason": "stop"
}
```
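One way to consume the stream from Python is sketched below. It assumes newline-delimited JSON chunks and a placeholder base URL; the actual chunk framing and host depend on your deployment.

```python
import json
import requests

BASE_URL = "http://localhost:8000"  # placeholder; use your deployment's host

# Abbreviated payload for illustration; see the full request example above.
payload = {"query": "What is the capital of France?", "api_key": "your_api_key_here"}

with requests.post(
    f"{BASE_URL}/api/stream_generate_answer", json=payload, stream=True, timeout=60
) as resp:
    resp.raise_for_status()
    # Assumes each chunk arrives as one JSON object per line.
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk["response"], end="", flush=True)
        if chunk.get("finish_reason") == "stop":
            break
```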
## Request Models Documentation

This section describes the data models used in the API requests for the chat LLM proxy. Each model outlines the structure and types of the request payloads.

### GenerateAnswerRequest

#### Description

The `GenerateAnswerRequest` model is used to encapsulate the data required to generate an answer from the chat LLM.

#### Properties

- `user_cred` (`UserCred`): Contains user credentials.
- `bot_config` (`BotConfig`): Configuration settings for the bot.
- `query` (`string`): The input query for which an answer is to be generated.
- `task_process` (`TaskProcess`): Information about the task being processed.
- `prev_context` (`string`, optional): Previous context for continuation requests.
- `kvp` (`dict`): Key-value pairs for additional parameters.
#### Example

```json
{
  "user_cred": {
    "token": "user_token",
    "client": {
      "tenant_id": "tenant_id"
    }
  },
  "bot_config": {
    "caller_version": "v6",
    "tool_config": null
  },
  "query": "What is the capital of France?",
  "task_process": {
    "service": "chat_service"
  },
  "prev_context": null,
  "kvp": {
    "files": [
      {
        "file_name": "document1.txt",
        "source_category": "InputFile"
      }
    ]
  }
}
```
### Other Models

#### UserCred

- `token` (`string`): The authentication token for the user.
- `client` (`ClientInfo`): Information about the client.

#### BotConfig

- `caller_version` (`string`): The version of the bot being called.
- `tool_config` (`ToolConfig`, optional): Configuration for tools used by the bot.

#### TaskProcess

- `service` (`string`): The service being used for the task.

#### ClientInfo

- `tenant_id` (`string`): The tenant ID associated with the client.

#### ToolConfig

- (Define properties as needed based on your application requirements.)
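For orientation, the models above can be pictured as Pydantic classes along the following lines. This is a sketch for illustration only; the authoritative definitions live in `app.controller.chat_classes`, and field names, defaults, and validation may differ.

```python
from typing import Optional
from pydantic import BaseModel

# Illustrative sketch; the real definitions live in app.controller.chat_classes.

class ClientInfo(BaseModel):
    tenant_id: str

class UserCred(BaseModel):
    token: str
    client: ClientInfo

class ToolConfig(BaseModel):
    pass  # Properties depend on the tools an application enables.

class BotConfig(BaseModel):
    caller_version: str
    tool_config: Optional[ToolConfig] = None

class TaskProcess(BaseModel):
    service: str

class GenerateAnswerRequest(BaseModel):
    user_cred: UserCred
    bot_config: BotConfig
    query: str
    task_process: TaskProcess
    prev_context: Optional[str] = None
    kvp: dict = {}
```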